Certified Associate Developer for Apache Spark v1.0

Exam contains 176 questions

Which of the following code blocks will most quickly return an approximation for the number of distinct values in column division in DataFrame storesDF?

  • A. storesDF.agg(approx_count_distinct(col("division")).alias("divisionDistinct"))
  • B. storesDF.agg(approx_count_distinct(col("division"), 0.01).alias("divisionDistinct"))
  • C. storesDF.agg(approx_count_distinct(col("division"), 0.15).alias("divisionDistinct"))
  • D. storesDF.agg(approx_count_distinct(col("division"), 0.0).alias("divisionDistinct"))
  • E. storesDF.agg(approx_count_distinct(col("division"), 0.05).alias("divisionDistinct"))


Answer : A
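
For reference, a minimal PySpark sketch of approx_count_distinct, assuming storesDF already exists; the optional second argument is the maximum allowed relative standard deviation (rsd), which defaults to 0.05:

from pyspark.sql.functions import approx_count_distinct, col

# Approximate distinct count of division using the default rsd of 0.05
storesDF.agg(approx_count_distinct(col("division")).alias("divisionDistinct")).show()

# A larger rsd (here 0.15) relaxes the accuracy guarantee of the estimate
storesDF.agg(approx_count_distinct(col("division"), 0.15).alias("divisionDistinct")).show()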

The code block shown below contains an error. The code block is intended to return a new DataFrame with the mean of column sqft from DataFrame storesDF in column sqftMean. Identify the error.
Code block:
storesDF.agg(mean("sqft").alias("sqftMean"))

  • A. The argument to the mean() operation should be a Column object rather than a string column name.
  • B. The argument to the mean() operation should not be quoted.
  • C. The mean() operation is not a standalone function – it’s a method of the Column object.
  • D. The agg() operation is not appropriate here – the withColumn() operation should be used instead.
  • E. The only way to compute a mean of a column is with the mean() method from a DataFrame.


Answer : A
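
A short sketch of the aggregation with an explicit Column argument, assuming storesDF has a numeric sqft column:

from pyspark.sql.functions import col, mean

# One-row DataFrame whose only column, sqftMean, holds the mean of sqft
storesDF.agg(mean(col("sqft")).alias("sqftMean")).show()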

Which of the following operations can be used to return the number of rows in a DataFrame?

  • A. DataFrame.numberOfRows()
  • B. DataFrame.n()
  • C. DataFrame.sum()
  • D. DataFrame.count()
  • E. DataFrame.countDistinct()


Answer : D
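
For illustration, assuming storesDF exists:

# count() is an action that returns the number of rows as a Python int
numRows = storesDF.count()
print(numRows)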

Which of the following operations returns a GroupedData object?

  • A. DataFrame.GroupBy()
  • B. DataFrame.cubed()
  • C. DataFrame.group()
  • D. DataFrame.groupBy()
  • E. DataFrame.grouping_id()


Answer : D
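
A brief sketch, assuming storesDF has a division column:

# groupBy() returns a pyspark.sql.GroupedData object;
# calling an aggregation on it returns a DataFrame again
grouped = storesDF.groupBy("division")
countsDF = grouped.count()
countsDF.show()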

Which of the following code blocks returns a collection of summary statistics for all columns in DataFrame storesDF?

  • A. storesDF.summary("mean")
  • B. storesDF.describe(all = True)
  • C. storesDF.describe("all")
  • D. storesDF.summary("all")
  • E. storesDF.describe()


Answer : E
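
Both describe() and summary() return summary statistics as a DataFrame; a sketch assuming storesDF exists:

# describe() with no arguments covers count, mean, stddev, min, and max for every column
storesDF.describe().show()

# summary() with no arguments additionally reports the 25%, 50%, and 75% percentiles
storesDF.summary().show()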

Which of the following code blocks fails to return a DataFrame reverse sorted alphabetically based on column division?

  • A. storesDF.orderBy("division", ascending = False)
  • B. storesDF.orderBy(["division"], ascending = [0])
  • C. storesDF.orderBy(col("division").asc())
  • D. storesDF.sort("division", ascending = False)
  • E. storesDF.sort(desc("division"))


Answer : C
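
For comparison, three equivalent ways to reverse-sort by division, assuming storesDF exists:

from pyspark.sql.functions import col, desc

storesDF.orderBy(col("division").desc())          # Column.desc()
storesDF.orderBy("division", ascending=False)     # keyword flag
storesDF.sort(desc("division"))                   # desc() function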

Which of the following code blocks returns a 15 percent sample of rows from DataFrame storesDF without replacement?

  • A. storesDF.sample(fraction = 0.10)
  • B. storesDF.sampleBy(fraction = 0.15)
  • C. storesDF.sample(True, fraction = 0.10)
  • D. storesDF.sample()
  • E. storesDF.sample(fraction = 0.15)


Answer : E
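
A sketch of sampling without replacement, assuming storesDF exists; the seed is only an illustrative value:

# 15 percent sample without replacement (withReplacement defaults to False)
sampledDF = storesDF.sample(fraction=0.15)

# The same call with an explicit seed for reproducibility
sampledDF = storesDF.sample(withReplacement=False, fraction=0.15, seed=42)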

Which of the following code blocks returns all the rows from DataFrame storesDF?

  • A. storesDF.head()
  • B. storesDF.collect()
  • C. storesDF.count()
  • D. storesDF.take()
  • E. storesDF.show()


Answer : B
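
A brief reminder of the action, assuming storesDF exists:

# collect() returns every row to the driver as a list of Row objects,
# so all of the data must fit in driver memory
rows = storesDF.collect()
print(len(rows))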

Which of the following code blocks applies the function assessPerformance() to each row of DataFrame storesDF?

  • A. [assessPerformance(row) for row in storesDF.take(3)]
  • B. [assessPerformance() for row in storesDF]
  • C. storesDF.collect().apply(lambda: assessPerformance)
  • D. [assessPerformance(row) for row in storesDF.collect()]
  • E. [assessPerformance(row) for row in storesDF]


Answer : D
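
A sketch of driver-side row iteration, assuming assessPerformance() is an existing Python function and storesDF fits in driver memory:

# collect() yields a list of Row objects that can be passed to a plain Python function
results = [assessPerformance(row) for row in storesDF.collect()]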

The code block shown below contains an error. The code block is intended to print the schema of DataFrame storesDF. Identify the error.
Code block:
storesDF.printSchema

  • A. There is no printSchema member of DataFrame – schema and the print() function should be used instead.
  • B. The entire line needs to be a string – it should be wrapped by str().
  • C. There is no printSchema member of DataFrame – the getSchema() operation should be used instead.
  • D. There is no printSchema member of DataFrame – the schema() operation should be used instead.
  • E. The printSchema member of DataFrame is an operation and needs to be followed by parentheses.


Answer : E
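
The corrected call, assuming storesDF exists:

# printSchema() is a method and must be invoked with parentheses
storesDF.printSchema()

# The schema is also available programmatically as a StructType
print(storesDF.schema)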

The code block shown below should create and register a SQL UDF named "ASSESS_PERFORMANCE" using the Python function assessPerformance() and apply it to column customerSatisfaction in table stores. Choose the response that correctly fills in the numbered blanks within the code block to complete this task.
Code block:
spark._1_._2_(_3_, _4_)
spark.sql("SELECT customerSatisfaction, _5_(customerSatisfaction) AS result FROM stores")

  • A. 1. udf
    2. register
    3. "ASSESS_PERFORMANCE"
    4. assessPerformance
    5. ASSESS_PERFORMANCE
  • B. 1. udf
    2. register
    3. assessPerformance
    4. "ASSESS_PERFORMANCE"
    5. "ASSESS_PERFORMANCE"
  • C. 1. udf
    2. register
    3."ASSESS_PERFORMANCE"
    4. assessPerformance
    5. "ASSESS_PERFORMANCE"
  • D. 1. register
    2. udf
    3. "ASSESS_PERFORMANCE"
    4. assessPerformance
    5. "ASSESS_PERFORMANCE"
  • E. 1. udf
    2. register
    3. ASSESS_PERFORMANCE
    4. assessPerformance
    5. ASSESS_PERFORMANCE


Answer : A
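
A sketch of the completed code, assuming assessPerformance() is an existing Python function and a table or view named stores is available:

# Register the Python function under a name that SQL can call
spark.udf.register("ASSESS_PERFORMANCE", assessPerformance)

resultDF = spark.sql(
    "SELECT customerSatisfaction, ASSESS_PERFORMANCE(customerSatisfaction) AS result FROM stores"
)
resultDF.show()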

The code block shown below contains an error. The code block is intended to create a Python UDF assessPerformanceUDF() using the integer-returning Python function assessPerformance() and apply it to column customerSatisfaction in DataFrame storesDF. Identify the error.
Code block:
assessPerformanceUDF = udf(assessPerformance)
storesDF.withColumn("result", assessPerformanceUDF(col("customerSatisfaction")))

  • A. The assessPerformance() operation is not properly registered as a UDF.
  • B. The withColumn() operation is not appropriate here – UDFs should be applied by iterating over rows instead.
  • C. UDFs can only be applied via SQL and not through the DataFrame API.
  • D. The return type of the assessPerformanceUDF() is not specified in the udf() operation.
  • E. The assessPerformance() operation should be used on column customerSatisfaction rather than the assessPerformanceUDF() operation.


Answer : A
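
A sketch of the intended code with a properly created UDF, assuming assessPerformance() is an existing integer-returning Python function:

from pyspark.sql.functions import col, udf
from pyspark.sql.types import IntegerType

# Wrap the Python function as a UDF; declaring the return type avoids the StringType default
assessPerformanceUDF = udf(assessPerformance, IntegerType())

resultDF = storesDF.withColumn("result", assessPerformanceUDF(col("customerSatisfaction")))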

The code block shown below contains an error. The code block is intended to use SQL to return a new DataFrame containing column storeId and column managerName from a table created from DataFrame storesDF. Identify the error.
Code block:
storesDF.createOrReplaceTempView("stores")
storesDF.sql("SELECT storeId, managerName FROM stores")

  • A. The createOrReplaceTempView() operation does not make a DataFrame accessible via SQL.
  • B. The sql() operation should be accessed via the spark variable rather than DataFrame storesDF.
  • C. There is no sql() operation in DataFrame storesDF – the query() operation should be used instead.
  • D. This cannot be accomplished using SQL – the DataFrame API should be used instead.
  • E. The createOrReplaceTempView() operation should be accessed via the spark variable rather than DataFrame storesDF.


Answer : B
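
The corrected pattern, assuming storesDF and a SparkSession named spark exist:

# Register the DataFrame as a temporary view, then query it through the SparkSession
storesDF.createOrReplaceTempView("stores")
resultDF = spark.sql("SELECT storeId, managerName FROM stores")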

The code block shown below should create a single-column DataFrame from Python list years which is made up of integers. Choose the response that correctly fills in the numbered blanks within the code block to complete this task.
Code block:
_1_._2_(_3_, _4_)

  • A. 1. spark
    2. createDataFrame
    3. years
    4. IntegerType
  • B. 1. DataFrame
    2. create
    3. [years]
    4. IntegerType
  • C. 1. spark
    2. createDataFrame
    3. [years]
    4. IntegerType
  • D. 1. spark
    2. createDataFrame
    3. [years]
    4. IntegerType()
  • E. 1. spark
    2. createDataFrame
    3. years
    4. IntegerType()


Answer : D
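
One runnable way to build such a DataFrame, with an illustrative years list (the actual contents are not given in the question):

from pyspark.sql.types import IntegerType

years = [2017, 2018, 2019]  # illustrative values only
yearsDF = spark.createDataFrame(years, IntegerType())
yearsDF.show()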

The code block shown below contains an error. The code block is intended to cache DataFrame storesDF only in Spark’s memory and then return the number of rows in the cached DataFrame. Identify the error.
Code block:
storesDF.cache().count()

  • A. The cache() operation caches DataFrames at the MEMORY_AND_DISK level by default – the MEMORY_ONLY storage level must be specified as an argument to cache().
  • B. The cache() operation caches DataFrames at the MEMORY_AND_DISK level by default – the storage level must be set via storesDF.storageLevel prior to calling cache().
  • C. The storesDF DataFrame has not been checkpointed – it must have a checkpoint in order to be cached.
  • D. DataFrames themselves cannot be cached – DataFrame storesDF must be cached as a table.
  • E. The cache() operation can only cache DataFrames at the MEMORY_AND_DISK level (the default) – persist() should be used instead.


Answer : B
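
For reference, persist() accepts an explicit storage level, which is how a DataFrame can be kept in memory only; a sketch assuming storesDF exists:

from pyspark import StorageLevel

# persist() accepts a StorageLevel; cache() always uses the default level
storesDF.persist(StorageLevel.MEMORY_ONLY)
numRows = storesDF.count()   # the action materializes the cached data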
